Skip to content

BUG:fix rolling skew kurt floating issue #18065

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 8, 2017
Merged

BUG:fix rolling skew kurt floating issue #18065

merged 8 commits into from
Nov 8, 2017

Conversation

zartbot
Copy link
Contributor

@zartbot zartbot commented Nov 1, 2017

@TomAugspurger
Copy link
Contributor

Thanks, the failure on Travis is unrelated.

Could you add a note to whatsnew/v0.21.1.txt and a unit test (maybe using the example from #18044)?

@TomAugspurger TomAugspurger added this to the 0.21.1 milestone Nov 1, 2017
@TomAugspurger TomAugspurger added the Numeric Operations Arithmetic, Comparison, and Logical operations label Nov 1, 2017
@pep8speaks
Copy link

pep8speaks commented Nov 2, 2017

Hello @ogotaiking! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 07, 2017 at 13:51 Hours UTC

@codecov
Copy link

codecov bot commented Nov 2, 2017

Codecov Report

Merging #18065 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18065      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50120    50120              
==========================================
- Hits        45737    45728       -9     
- Misses       4383     4392       +9
Flag Coverage Δ
#multiple 89.04% <ø> (ø) ⬆️
#single 40.32% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15fa4bd...58299cb. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 2, 2017

Codecov Report

Merging #18065 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18065      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50124    50124              
==========================================
- Hits        45742    45733       -9     
- Misses       4382     4391       +9
Flag Coverage Δ
#multiple 89.05% <ø> (ø) ⬆️
#single 40.32% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc69dc6...ee5a33d. Read the comment docs.

@@ -57,6 +57,8 @@ Documentation Changes
Bug Fixes
~~~~~~~~~
- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)
- Bug in ``pd.Series.rolling.skew()`` and ``rolling.kurt()`` with all equal values has floating issue (:issue:`18044`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

producing numerical unstable results rather than NaN

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original behavior for skewness and kurtosis calculation is :
if the {Xi} distribution is a uniform distribution ( all values are equal), Thus the variance of the values( Sum(Xi-Xmean) ^2 is zero. so B should be zero in this case. But some numerically unstable issue cause B not to be zero.
so consider this situation, we assume if {Xi}'s variance is less than 1e-14, we treat it as uniform distribution.
based on previous behavior is return Nan. I don't want to break this behavior.
Some reference :
from mathematica, for uniform distribution skew/kurt return Nan
http://www.wolframalpha.com/input/?i=skewness%7B1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1%7D
http://www.wolframalpha.com/input/?i=kurtosis%7B1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1%7D

@@ -788,7 +788,7 @@ cdef inline double calc_skew(int64_t minp, int64_t nobs, double x, double xx,
A = x / dnobs
B = xx / dnobs - A * A
C = xxx / dnobs - A * A * A - 3 * A * B
if B <= 0 or nobs < 3:
if B <= 1e-14 or nobs < 3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here about what we are checking (and below)

@jreback
Copy link
Contributor

jreback commented Nov 4, 2017

pls rebase on master as well.

@zartbot
Copy link
Contributor Author

zartbot commented Nov 6, 2017

@jreback for the comments for return Nan for numerically unstable issue[ seems lost the comments during rebase]

consider the uniform distribution {Xi}, X1=X2=....Xn-1=Xn, the variance sum(Xi-Xm)^2/n should be Zero, but some floating issue cause the variance is in range(0,1e-14), and the variance is in the denominator, that will cause skew/kurt result with an extremely large number. so we can treat if the variance is less than 1e-14, it's uniform distribution.

The original code only checks if the variance == Zero, and return Nan for uniform distribution, so I don't want to break this behavior and leave the result as Nan.

some reference :
http://www.wolframalpha.com/input/?source=nav&i=skewness%7B1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1%7D

http://www.wolframalpha.com/input/?source=nav&i=kurtosis%7B1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1,1.1%7D

result is Nan(undefined)

@jreback
Copy link
Contributor

jreback commented Nov 6, 2017

@ogotaiking not sure of your last, is this a different case?

@zartbot
Copy link
Contributor Author

zartbot commented Nov 7, 2017

@jreback same case. and also a reference from original skewness calculation code

pandas/core/nanops.py

def nanskew(values, axis=None, skipna=True):
    ...
    # floating point error
    m2 = _zero_out_fperr(m2) <- here m2 is eq B in rolling_skew
    m3 = _zero_out_fperr(m3)

    with np.errstate(invalid='ignore', divide='ignore'):
        result = (count * (count - 1) ** 0.5 / (count - 2)) * (m3 / m2 ** 1.5)

and the function _zero_out_fperr

_zero_out_fperr(arg):
    if isinstance(arg, np.ndarray):
        with np.errstate(invalid='ignore'):
            return np.where(np.abs(arg) < 1e-14, 0, arg)
    else:
        return arg.dtype.type(0) if np.abs(arg) < 1e-14 else arg

so here we treat the B <1e-14 is eq 0 in calc_skew and calc_kurt is following the original skew/kurt behavior.

@jreback
Copy link
Contributor

jreback commented Nov 7, 2017

@ogotaiking ok I c. can you add update the comment in the code above to include a refernce to _zero_fperr (and maybe a reference in zero_fperr to calc_kurt/calc_skew), just so that future readers will know that these are connected. otherwise lgtm. ping when pushed / green.

# cause B != 0. and cause the result is a very
# large number.
#
# in core/nanops.py nanskew/nankurt call the function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the reference to _zero_fperr as well (back to here)

@jreback
Copy link
Contributor

jreback commented Nov 7, 2017

lgtm merge on green

@jreback jreback merged commit 93c755e into pandas-dev:master Nov 8, 2017
@jreback
Copy link
Contributor

jreback commented Nov 8, 2017

thanks @ogotaiking

watercrossing pushed a commit to watercrossing/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017
TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rolling skew and rolling kurt has wrong answer under all eq values
4 participants